64 research outputs found

    E-CLIP: Towards Label-efficient Event-based Open-world Understanding by CLIP

    Contrastive Language-Image Pre-training (CLIP) has recently shown promising open-world and few-shot performance on 2D image-based recognition tasks. However, the transfer of CLIP's capability to novel event camera data remains under-explored. In particular, due to the modality gap with image-text data and the lack of large-scale datasets, achieving this goal is non-trivial and thus requires significant research innovation. In this paper, we propose E-CLIP, a novel and effective framework that unleashes the potential of CLIP for event-based recognition to compensate for the lack of large-scale event-based datasets. Our work addresses two crucial challenges: 1) how to generalize CLIP's visual encoder to event data while fully leveraging events' unique properties, e.g., sparsity and high temporal resolution; 2) how to effectively align the multi-modal embeddings, i.e., image, text, and events. To this end, we first introduce a novel event encoder that subtly models the temporal information from events and meanwhile generates event prompts to promote modality bridging. We then design a text encoder that generates content prompts and utilizes hybrid text prompts to enhance E-CLIP's generalization ability across diverse datasets. With the proposed event encoder, text encoder, and original image encoder, a novel Hierarchical Triple Contrastive Alignment (HTCA) module is introduced to jointly optimize the correlation and enable efficient knowledge transfer among the three modalities. We conduct extensive experiments on two recognition benchmarks, and the results demonstrate that E-CLIP outperforms existing methods by a large margin of +3.94% and +4.62% on the N-Caltech dataset in the fine-tuning and few-shot settings, respectively. Moreover, E-CLIP can be flexibly extended to the event retrieval task using either text or image queries, showing plausible performance. Comment: Journal version with supplementary material
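
To make the idea of aligning three modalities concrete, the sketch below sums a symmetric InfoNCE loss over the three modality pairs (event-image, event-text, image-text). The abstract does not specify the HTCA loss, so the pairwise form, the temperature value, and the function names `info_nce` and `triple_alignment_loss` are all assumptions for illustration, not the authors' method.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE: row i of `a` should match row i of `b`."""
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    n = len(a)
    # Cosine-similarity logits scaled by the (assumed) temperature.
    logits = [[sum(x * y for x, y in zip(a[i], b[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy_diag(rows):
        # Cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_sum = m + math.log(sum(math.exp(z - m) for z in row))
            total += log_sum - row[i]
        return total / len(rows)

    columns = [list(c) for c in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(columns))

def triple_alignment_loss(event, image, text, temperature=0.07):
    """Hypothetical stand-in for a three-way alignment objective."""
    return (info_nce(event, image, temperature)
            + info_nce(event, text, temperature)
            + info_nce(image, text, temperature))
```

With matched embeddings in all three modalities the loss is close to zero; permuting one modality's rows so they no longer correspond raises it sharply.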

    Deep Learning for Differentiating Benign From Malignant Parotid Lesions on MR Images

    Purpose/Objective(s): Salivary gland tumors are a rare, histologically heterogeneous group of tumors. The distinction between malignant and benign tumors of the parotid gland is clinically important. This study aims to develop and evaluate a deep-learning network for diagnosing parotid gland tumors via the deep learning of MR images. Materials/Methods: Two hundred thirty-three patients with parotid gland tumors were enrolled in this study. Histology results were available for all tumors. All patients underwent MRI scans, including T1-weighted, CE-T1-weighted and T2-weighted imaging series. The parotid glands and tumors were segmented on all three MR image series by a radiologist with 10 years of clinical experience. A total of 3791 parotid gland region images were cropped from the MR images. A label (pleomorphic adenoma and Warthin tumor, malignant tumor or free of tumor), which was based on histology results, was assigned to each image. To train the deep-learning model, these data were randomly divided into a training dataset (90%, comprising 3035 MR images from 212 patients: 714 pleomorphic adenoma images, 558 Warthin tumor images, 861 malignant tumor images, and 902 images free of tumor) and a validation dataset (10%, comprising 275 images from 21 patients: 57 pleomorphic adenoma images, 36 Warthin tumor images, 93 malignant tumor images, and 89 images free of tumor). A modified ResNet model was developed to classify these images. The input images were resized to 224x224 pixels, including four channels (T1-weighted tumor images only, T2-weighted tumor images only, CE-T1-weighted tumor images only and parotid gland images). Random image flipping and contrast adjustment were used for data enhancement. The model was trained for 1200 epochs with a learning rate of 1e-6, and the Adam optimizer was implemented. It took approximately 2 hours to complete the whole training procedure. The whole program was developed with PyTorch (version 1.2). Results: The model accuracy with the training dataset was 92.94% (95% CI [0.91, 0.93]). The micro-AUC was 0.98. The experimental results showed that the accuracy of the final algorithm in the diagnosis and staging of parotid cancer was 82.18% (95% CI [0.77, 0.86]). The micro-AUC was 0.93. Conclusion: The proposed model may be used to assist clinicians in the diagnosis of parotid tumors. However, future larger-scale multicenter studies are required for full validation
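
The reported 82.18% accuracy and its confidence interval can be sanity-checked with a short calculation. The sketch below computes a Wilson score interval for a proportion. It assumes the 82.18% figure refers to the 275 validation images, which would correspond to 226 correct predictions (226 / 275 = 82.18%); that count is inferred, and the abstract does not say which interval method the authors used.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Assumed count: 226 correct out of the 275 validation images.
low, high = wilson_interval(226, 275)
print(f"accuracy={226 / 275:.4f}, 95% CI=[{low:.2f}, {high:.2f}]")
# prints: accuracy=0.8218, 95% CI=[0.77, 0.86]
```

Under these assumptions the result agrees with the interval quoted in the abstract for the final accuracy.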

    A clinically relevant online patient QA solution with daily CT scans and EPID-based in vivo dosimetry: A feasible study on rectal cancer

    Adaptive radiation therapy (ART) can protect organs at risk (OARs) while maintaining high dose coverage of targets. However, efficient online patient QA methods are still lacking. We aim to develop a clinically relevant online patient quality assurance (QA) solution for ART using daily CT scans and electronic portal imaging device (EPID)-based in vivo dosimetry. Ten patients with rectal cancer at our center were included. Patients' daily CT scans and portal images were collected to generate reconstructed 3D dose distributions. Targets and OARs were recontoured on these daily CT scans by a clinician or an auto-segmentation algorithm; dose-volume indices were then calculated, and the percent deviation of these indices from their original plans was determined. This deviation was regarded as the metric for clinically relevant patient QA. The tolerance level was obtained using a 95% interval of the QA metric distribution. These deviations could be further divided into anatomically relevant or delivery relevant indicators for error source analysis. Finally, our QA solution was validated on an additional six clinical patients. In rectal cancer, the lower and upper tolerances of the QA metric were [-3.11%, 2.35%] for PTV ΔD95 (%) and [-0.78%, 3.23%] for PTV ΔD2 (%). In validation, 68% of the 28 fractions for PTV ΔD95 (%) and 79% for PTV ΔD2 (%) were within the tolerances of the QA metrics. Using four or more out-of-tolerance QA metrics as the action level, 5 fractions (18%) in the validation patient dataset reached the action level. The online patient QA solution using daily CT scans and EPID-based in vivo dosimetry is clinically feasible. Source of error analysis has the potential for distinguishing sources of error and guiding ART for future treatments
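
The QA metric described above (percent deviation of a dose-volume index from the plan, a 95% tolerance interval, and an action level counted in out-of-tolerance metrics) can be sketched as follows. The abstract does not state how the 95% interval was computed, so the percentile-based interval here is an assumption, and the function names are hypothetical.

```python
def percent_deviation(delivered, planned):
    """Percent deviation of a delivered dose-volume index from its planned value."""
    return 100.0 * (delivered - planned) / planned

def percentile(values, q):
    """Linear-interpolation percentile, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def tolerance_interval(deviations, coverage=95.0):
    """Assumed definition: central `coverage` percent of observed deviations."""
    tail = (100.0 - coverage) / 2.0
    return percentile(deviations, tail), percentile(deviations, 100.0 - tail)

def needs_action(fraction_deviations, tolerances, action_level=4):
    """True when at least `action_level` metrics fall outside their tolerance.

    Both arguments are dicts keyed by metric name; tolerances maps each
    name to a (lower, upper) pair in percent.
    """
    out_of_tolerance = sum(
        1 for name, dev in fraction_deviations.items()
        if not tolerances[name][0] <= dev <= tolerances[name][1]
    )
    return out_of_tolerance >= action_level
```

For example, with the tolerances quoted in the abstract, a fraction whose PTV ΔD95 deviation is -3.5% would count as one out-of-tolerance metric toward the action level.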

    Importance-Driven Composition of Multiple Rendering Styles

    We introduce a non-uniform composition that integrates multiple rendering styles in a picture driven by an importance map. This map, either issued from saliency estimation or designed by a user, is introduced both in the creation of the multiple styles and in the final composition. Our approach accommodates a variety of stylization techniques, such as color desaturation, line drawing, blurring, edge-preserving smoothing and enhancement. We illustrate the versatility of the proposed approach and the variety of rendering styles on different applications such as images, videos, 3D scenes and even mixed reality. We also demonstrate that such an approach may help in directing user attention
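
One simple way to picture an importance-driven composition is a per-pixel blend of two stylized versions of the same picture, weighted by the importance map. The abstract does not give the composition operator, so the linear blend, the restriction to two styles, and the function name `compose` are assumptions for illustration only.

```python
def compose(style_high, style_low, importance):
    """Blend two same-sized grayscale images per pixel.

    Images are lists of rows of floats. Importance values lie in [0, 1]:
    1 selects the style meant for important regions, 0 the other style.
    """
    return [
        [w * hi + (1.0 - w) * lo for hi, lo, w in zip(row_hi, row_lo, row_w)]
        for row_hi, row_lo, row_w in zip(style_high, style_low, importance)
    ]
```

For instance, `style_high` could be a detail-enhanced rendering and `style_low` a blurred or desaturated one, so that salient regions keep detail while the rest is abstracted.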

    Distributed learning on 20 000+ lung cancer patients - The Personal Health Train

    Background and purpose: Access to healthcare data is indispensable for scientific progress and innovation. Sharing healthcare data is time-consuming and notoriously difficult due to privacy and regulatory concerns. The Personal Health Train (PHT) provides a privacy-by-design infrastructure connecting FAIR (Findable, Accessible, Interoperable, Reusable) data sources and allows distributed data analysis and machine learning. Patient data never leaves a healthcare institute. Materials and methods: Lung cancer patient-specific databases (tumor staging and post-treatment survival information) of oncology departments were translated according to a FAIR data model and stored locally in a graph database. Software was installed locally to enable deployment of distributed machine learning algorithms via a central server. Algorithms (MATLAB, code and documentation publicly available) are patient privacy-preserving as only summary statistics and regression coefficients are exchanged with the central server. A logistic regression model to predict post-treatment two-year survival was trained and evaluated by receiver operating characteristic curves (ROC), root mean square prediction error (RMSE) and calibration plots. Results: In 4 months, we connected databases with 23 203 patient cases across 8 healthcare institutes in 5 countries (Amsterdam, Cardiff, Maastricht, Manchester, Nijmegen, Rome, Rotterdam, Shanghai) using the PHT. Summary statistics were computed across databases. A distributed logistic regression model predicting post-treatment two-year survival was trained on 14 810 patients treated between 1978 and 2011 and validated on 8 393 patients treated between 2012 and 2015. Conclusion: The PHT infrastructure demonstrably overcomes patient privacy barriers to healthcare data sharing and enables fast data analyses across multiple institutes from different countries with different regulatory regimens. This infrastructure promotes global evidence-based medicine while prioritizing patient privacy
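
The principle of training a logistic regression without moving patient rows can be sketched as below: each site computes an aggregate on its own data, and the central server only combines those aggregates and updates the coefficients. This is a minimal simulation under stated assumptions, not the published MATLAB algorithms; exchanging summed gradients, the learning rate, the round count, and the function names are all hypothetical choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_gradient(weights, rows, labels):
    """Runs at a site: returns only the summed gradient and the row count."""
    grad = [0.0] * len(weights)
    for x, y in zip(rows, labels):
        err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
        for j, xi in enumerate(x):
            grad[j] += err * xi
    return grad, len(rows)

def train(sites, n_features, lr=0.5, rounds=500):
    """Runs at the central server: combines aggregates, never reads rows.

    `sites` is a list of (rows, labels) pairs that stands in for remote
    institutes; in a real deployment local_gradient would run remotely.
    """
    weights = [0.0] * n_features
    for _ in range(rounds):
        total = [0.0] * n_features
        count = 0
        for rows, labels in sites:
            grad, n = local_gradient(weights, rows, labels)
            total = [t + g for t, g in zip(total, grad)]
            count += n
        # Gradient-descent step on the mean gradient across all sites.
        weights = [w - lr * t / count for w, t in zip(weights, total)]
    return weights
```

Because the summed gradient over all sites equals the gradient over the pooled data, this scheme yields the same coefficients as centralized training, up to floating-point rounding.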